An estimation method is proposed for a wide variety of discrete time
stochastic processes that have an intractable likelihood function but are
otherwise conveniently specified by an integral transform such as the
characteristic function, the Laplace transform or the probability
generating function. This method involves the construction of classes of
transform-based martingale estimating functions that fit into the general
framework of quasi-likelihood. In the parametric setting of a discrete time
stochastic process, we obtain transform quasi-score functions by projecting
the unavailable score function onto the special linear spaces formed by
these classes. The specification of the process by any of the main integral
transforms makes possible an arbitrarily close approximation of the score
function in an infinite-dimensional Hilbert space by optimally combining
transform martingale quasi-score functions. It also allows an extension of
the domain of application of quasi-likelihood methodology to processes with
infinite conditional second moment.

1. Introduction. Maximum likelihood estimation of parameters of discrete
time stochastic processes is often not feasible because an explicit
expression for the associated likelihood function is either unavailable or
too complicated. In a wide variety of situations, however, a description of
the process by an integral transform, such as the conditional
characteristic function or the conditional Laplace transform, is more
readily available than explicit likelihood or score functions. A broad
array of such processes encountered in the literature includes the
following. First, there are linear processes with infinite variance used in
modeling certain time series phenomena. In particular, models in economics
and in signal processing involving linear time series with errors having a
stable distribution have been considered in the literature; see, for
example, McCulloch [25] and Nikias and Shao [30].
T. MERKOURIS

In general, no closed form expression suitable for likelihood methods exists 
for the densities of such processes, but a simple form of characteristic func- 
tion or Laplace transform is available. Second, there are two rich classes 
of discrete and non-Gaussian continuous variate time series models. One of 
these classes includes models with exponential or gamma marginal distribu- 
tions, which are of importance in queuing and network processes. The other 
class includes stationary processes with discrete marginal distributions, such 
as Poisson, geometric or negative binomial, which are useful for modeling 
counting processes consisting of dependent random variables. A detailed sur- 
vey of first-order autoregressive processes with such distributions is given in 
Grunwald, Hyndman, Tedesco and Tweedie [17]. For the higher-order au- 
toregressive case, see Billard and Mohamed [3], Alzaid and Al-Osh [2] and 
references therein. These non-Gaussian models have an intractable likelihood 
function because of its complexity or inherent discontinuities. Nonetheless, 
they are described handily by their Laplace transforms or probability gener- 
ating functions. Third, there are aggregate models, such as aggregate Markov 
chains and the related compartmental models used in many areas, such as 
economics, population theory and the social sciences, when only aggregate 
data are available; see, for example, McLeish [26] and Leitnaker [24]. Such 
models involve convolutions whose densities are rarely tractable but have a 
simple representation by their probability generating functions. 

In all these examples maximum likelihood estimation for the parameters 
of the relevant models has been regarded as unworkable, and other methods 
of estimation, typically using only the first two moments of the underlying 
distribution, are generally suboptimal. 

In this article we propose an estimation method for discrete time
stochastic processes that are conveniently specified by conditional
integral transforms. The development of this method follows earlier work
(Merkouris [27, 28]) that introduced quasi-likelihood estimation for
discrete time semimartingales based on conditional integral transforms.
Though quite distinct,
this approach is related to previous estimation methods in the literature that 
involved fitting an empirical transform to its theoretical counterpart in the 
setting of i.i.d. random variables. In this sense, it is akin to the general- 
ized moment procedure proposed by Feuerverger and McDunnough [14] and 
Brant [4] as an approximate maximum likelihood procedure. 

In the parametric setting of a discrete time stochastic process with in- 
tractable likelihood function we build classes of martingale estimating func- 
tions by means of an integral transform that specifies the process. Such 
classes of transform-based martingale estimating functions fit into the
general quasi-likelihood framework given by Godambe and Heyde [15]. In our
quasi-likelihood approach, we obtain quasi-score functions as projections of 
the unavailable score function onto the special linear spaces formed by these 



TRANSFORM MARTINGALE ESTIMATING FUNCTIONS 3 

classes. Thus, these transform quasi-score functions provide best linear
approximations to the score function in Hilbert space. In contrast to the
semiparametric setting of the ordinary quasi-likelihood, the specification
of the
process by its transform structure enables the utilization of distributional 
information beyond the second-order moment structure. Enlarged classes of 
suitable composite transform martingales can then be constructed, which 
may lead to an arbitrarily close approximation of the score function. The 
transform structure allows also an extension of the quasi-likelihood method- 
ology to processes with infinite conditional second moment. Furthermore, 
combining transform martingale estimating functions may be effective in 
dealing with problems of identifiability of vector parameters. 

The proposed transform martingale estimating functions are generally 
nonlinear in the observations, and for the main integral transforms they may 
be represented as perturbed polynomial quasi-score functions. In particular, 
a basic transform martingale estimating function can be expressed as a per- 
turbed ordinary quasi-score (weighted conditional least squares) estimating 
function. 

Other estimation methods based on empirical transforms for discrete time 
stochastic processes have appeared in the literature. Feuerverger [11] dis- 
cussed an asymptotically efficient estimation procedure based on the "poly- 
characteristic" function in the setting of univariate stationary time series 
models. Brockwell and Liu [5] used the empirical characteristic function in 
estimation for a linear process with stable innovations. Estimators based on 
the Laplace transform were used for the estimation of parameters of pro- 
gression time distributions in multi-stage models by Schuh and Tweedie [32], 
Feigin, Tweedie and Belyea [10] and Hoeting, Tweedie and Olver [20]. Abra- 
ham and Balakrishna [1] used the empirical Laplace transform to estimate a 
parameter of a first-order inverse Gaussian autoregressive process. Yao and 
Morgan [37] used a least squares approach based on empirical transforms for 
a class of indexed stochastic models. Each of the aforementioned methods 
is essentially an ad hoc approach to the particular estimation problem. In 
contrast, this article presents a general estimation procedure that is statis- 
tically and computationally efficient, of broad applicability and based on a 
comprehensive estimation theory. 

The article is organized as follows. In Section 2 the transform method is 
introduced as a special quasi-likelihood estimation procedure with a poten- 
tial of high efficiency that rests on the capacity of constructing combinations 
of transform martingale estimating functions. A link is established between 
the method of Feuerverger and McDunnough, and Brant, and the quasi- 
likelihood approach through the concept of best linear approximation of 
the score function that underlies both estimation procedures. In Section 3 
the formulation of orthogonal projection of the score function onto suitable 
infinite-dimensional spaces lays the ground for the construction of
potentially fully efficient transform martingale estimating functions. A
practicable procedure for forming composite transform martingale estimating
functions of nondecreasing efficiency is then developed. Computational issues
related to the proposed estimating procedure are discussed in Section 4. 
Comparisons of martingale estimating functions based on important trans- 
forms are presented in Section 5. Also in Section 5, the relationship of the 
transform estimation method with ordinary quasi-likelihood and conditional 
least squares is established, and suggestions as to the choice of an efficient 
transform estimating function are made. Two brief examples illustrating 
important features of the transform method are given in Section 6. 

2. Transform martingale estimating functions. 

2.1. The basic structure. Let $\{Y_j, 1 \le j \le n\}$ be a sample from a
discrete time stochastic process which takes values in $r$-dimensional
Euclidean space and whose distribution depends on a parameter $\theta$
belonging to an open subset $\Theta$ of $p$-dimensional Euclidean space.
Suppose that the possible probability measures for $\{Y_j\}$ are
$\{P_\theta, \theta \in \Theta\}$ and that each
$(\Omega, \mathcal{F}, P_\theta)$ is a complete probability space. Let
$\mathcal{F}_j$ denote the past-history sub-$\sigma$-field of $\mathcal{F}$
generated by $Y_1, \ldots, Y_j$, $j \ge 1$. Suppose that the conditional
density function $f_\theta(Y_j \mid Y_1, \ldots, Y_{j-1})$ and the gradient
of $\log f_\theta(Y_j \mid Y_1, \ldots, Y_{j-1})$, denoted by $s_j$, exist.
Allowing differentiation under an integral sign,
$E(s_j \mid \mathcal{F}_{j-1}) = 0$ almost surely (a.s.) and, thus, the
score function $S_n = \sum_{j=1}^n s_j$ defines a zero-mean martingale,
$\{S_n, \mathcal{F}_n\}$, which is assumed to be square integrable.
The score function may be unavailable or too difficult to compute, so that 
estimating by maximum likelihood is not feasible. Nevertheless, a class of 
workable martingale estimating functions may instead be constructed based 
on an integral transform. 

We assume at first, for simplicity of exposition, that we deal with real
valued random variables $Y_1, \ldots, Y_n$ whose distribution depends on a
scalar parameter $\theta \in \Theta$. Then for the $j$th time point we write

    $F_j(y \mid \mathcal{F}_{j-1}) = P(Y_j \le y \mid \mathcal{F}_{j-1}), \qquad \hat{F}_j(y) = I_{[Y_j \le y]}, \qquad 1 \le j \le n,$

where $I$ denotes the indicator function. For an indexed set of real or
complex valued functions $\{g_t(Y), t \in T \subseteq \mathbb{R}\}$, the
kernel class, we consider the integral transform

(1)  $c_j(t) = \int g_t(y)\, dF_j(y \mid \mathcal{F}_{j-1}) = E(g_t(Y_j) \mid \mathcal{F}_{j-1}),$

where the kernel $g_t(\cdot)$ is such that the integral exists and is finite
for all $\theta \in \Theta$ and all $t \in T$. The dependence of
$F_j(y \mid \mathcal{F}_{j-1})$ and $c_j(t)$ on $\theta$ is suppressed
notationally for convenience of writing. The important transforms include
the characteristic function, the moment generating function and the
probability generating function, with associated kernel classes
$\{\sin(tY), \cos(tY), t \in \mathbb{R}\}$, $\{\exp(tY), t \in \mathbb{R}\}$
and $\{t^Y, t \in \mathbb{R}\}$, respectively, though others (e.g., the
Laplace transform or the sequence of moments) may be used as befits the
context.
Now let

(2)  $\hat{c}_j(t) = \int g_t(y)\, d\hat{F}_j(y)$

and write

(3)  $h_j(t) = \hat{c}_j(t) - c_j(t) = g_t(Y_j) - E(g_t(Y_j) \mid \mathcal{F}_{j-1}), \qquad 1 \le j \le n.$

Since $E(h_j(t) \mid \mathcal{F}_{j-1}) = 0$ a.s., for fixed $t \in T$ the
$\{h_j(t), \mathcal{F}_j\}$ are martingale differences of a zero-mean
martingale, say, $\{H_n(t) = \sum_{j=1}^n h_j(t), \mathcal{F}_n\}$, to which
we associate a class $\mathcal{M}_t$ of martingale estimating functions
defined by

(4)  $\mathcal{M}_t = \{ G_n(t) : G_n(t) = \sum_{j=1}^n w_j h_j(t),\ w_j = w_j(Y_1, \ldots, Y_{j-1}, \theta) \}.$

Estimators of $\theta$ can then be found by solving the estimating equations

    $G_n(t) = 0.$
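
As a concrete illustration of the martingale differences in (3), the following minimal sketch uses the probability generating function kernel $g_t(Y) = t^Y$ for a first-order integer-valued autoregression with binomial thinning and Poisson innovations. The model, its parameter values and all function names (`simulate_inar1`, `cond_pgf`, `martingale_differences`) are assumptions chosen for illustration only; they are not taken from the examples of Section 6. For this model the conditional transform is $c_j(t) = (1 - \alpha + \alpha t)^{Y_{j-1}} \exp(\lambda (t - 1))$.

```python
import math
import random


def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate Y_j = alpha o Y_{j-1} + eps_j with binomial thinning 'o'
    and Poisson(lam) innovations (an illustrative model, not from the text)."""
    rng = random.Random(seed)

    def poisson():
        # Knuth's multiplication method; adequate for small lam.
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    y = [poisson()]
    for _ in range(n):
        survivors = sum(rng.random() < alpha for _ in range(y[-1]))
        y.append(survivors + poisson())
    return y


def cond_pgf(t, y_prev, alpha, lam):
    """c_j(t) = E(t**Y_j | Y_{j-1} = y_prev), the transform (1) for the pgf kernel."""
    return (1.0 - alpha + alpha * t) ** y_prev * math.exp(lam * (t - 1.0))


def martingale_differences(y, t, alpha, lam):
    """h_j(t) = t**Y_j - c_j(t), j = 1, ..., n, as in (3)."""
    return [t ** y[j] - cond_pgf(t, y[j - 1], alpha, lam)
            for j in range(1, len(y))]


y = simulate_inar1(alpha=0.5, lam=1.0, n=5000, seed=1)
h_true = martingale_differences(y, t=0.5, alpha=0.5, lam=1.0)
h_wrong = martingale_differences(y, t=0.5, alpha=0.9, lam=1.0)
print(sum(h_true) / len(h_true))    # H_n(t)/n at the true alpha: close to 0
print(sum(h_wrong) / len(h_wrong))  # at a wrong alpha the average drifts from 0
```

The second printed value shows why $G_n(t) = 0$ can identify the parameter: the average of the $h_j(t)$ is centered at zero only when they are evaluated at the parameter value that generated the data.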

If we assume that the zero-mean martingale estimating functions $G_n(t)$
defined by (4) are square integrable and differentiable a.s. with respect to
$\theta$ for each $t \in T$, then the special class of transform-based
martingale estimating functions (4) fits into the general quasi-likelihood
framework given by Godambe and Heyde [15]. In this framework, which
incorporates essential ideas from the methods of least squares and maximum
likelihood, a basic martingale
$\{H_n = \sum_{j=1}^n h_j, \mathcal{F}_n\}$, with
$h_j = h_j(Y_1, \ldots, Y_j, \theta)$ and
$E(h_j \mid \mathcal{F}_{j-1}) = 0$ a.s., can be chosen in a variety of ways
that give rise to different classes of martingale estimating functions as
alternatives to the score function. In particular, since any discrete time
process $\{Y_n, \mathcal{F}_n\}$ has the semimartingale representation
$\sum_{j=1}^n Y_j = \sum_{j=1}^n E(Y_j \mid \mathcal{F}_{j-1}) + \sum_{j=1}^n h_j$,
with $h_j = Y_j - E(Y_j \mid \mathcal{F}_{j-1})$, a class of martingale
estimating functions may be based on the martingale $\sum_{j=1}^n h_j$.

In the present context, where it is assumed that a conditional transform
$c_j(t) = E(g_t(Y_j) \mid \mathcal{F}_{j-1})$ can be readily obtained (as in
the examples in Section 6), the martingale difference $h_j(t)$ defined in
(3) leads to the semimartingale representation of the transformed process
$g_t(Y_j)$, that is,

(5)  $\sum_{j=1}^n g_t(Y_j) = \sum_{j=1}^n E(g_t(Y_j) \mid \mathcal{F}_{j-1}) + H_n(t), \qquad t \in T.$

A strong law of large numbers for martingales will entail
$H_n(t)/n \to 0$ a.s., for every $t \in T$. This asymptotic equivalence of
$\sum_{j=1}^n g_t(Y_j)$ and $\sum_{j=1}^n E(g_t(Y_j) \mid \mathcal{F}_{j-1})$,
together with the one-to-one correspondence between the density
$f(Y_j \mid Y_1, \ldots, Y_{j-1})$ and $E(g_t(Y_j) \mid \mathcal{F}_{j-1})$,
for important kernels, supports using $\{H_n(t), \mathcal{F}_n\}$ as the
basic martingale to generate the class $\mathcal{M}_t$ of estimating
functions (4).

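2.2. Optimality considerations. The general theory of quasi-likelihood
furnishes an optimal estimating function within $\mathcal{M}_t$.
Accordingly, the estimating function $G_n^*(t) \in \mathcal{M}_t$ given by

    $G_n^*(t) = \sum_{j=1}^n w_j^* h_j(t),$

with

    $w_j^* = \frac{E(\frac{\partial}{\partial\theta} h_j(t) \mid \mathcal{F}_{j-1})}{E(h_j^2(t) \mid \mathcal{F}_{j-1})},$

satisfies the small sample optimality criterion ($O_F$-optimality) of
maximizing, for all $\theta$,

(6)  $\frac{(E \frac{\partial}{\partial\theta} G_n(t))^2}{E(G_n^2(t))},$

and the asymptotic optimality criterion ($O_A$-optimality) of maximizing,
a.s., for all $\theta$ and all $n \ge 1$,

(7)  $\frac{[\sum_{j=1}^n w_j E(\frac{\partial}{\partial\theta} h_j(t) \mid \mathcal{F}_{j-1})]^2}{\sum_{j=1}^n E((w_j h_j(t))^2 \mid \mathcal{F}_{j-1})}.$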

The estimating function $G_n^*(t)$ is a quasi-score estimating function, and
an estimator of $\theta$ obtained from $G_n^*(t) = 0$ is a quasi-likelihood
estimator. A comprehensive explanation of quasi-likelihood concepts is
available in Godambe and Heyde [15] and Heyde [19]. The quantity in (7),
denoted by $I_{G_n(t)}$, is the martingale information in $G_n(t)$. Its
maximum value, at $G_n(t) = G_n^*(t)$, is given by

    $I_{G_n^*(t)} = \sum_{j=1}^n \frac{[E(\frac{\partial}{\partial\theta} h_j(t) \mid \mathcal{F}_{j-1})]^2}{E(h_j^2(t) \mid \mathcal{F}_{j-1})}.$

$I_{G_n^*(t)}$ occurs as a scale variable in the asymptotic distribution of
the quasi-likelihood estimator of $\theta$. For the score function $S_n$,
$I_{S_n}$ is the conditional Fisher information
$\sum_{j=1}^n E(s_j^2 \mid \mathcal{F}_{j-1})$. Note that for $G_n^*(t)$,
the quantity (6) is equal to $E(I_{G_n^*(t)})$. Explicit forms of
$G_n^*(t)$ and $I_{G_n^*(t)}$ in terms of the kernel function may be
obtained in view of
$E(\frac{\partial}{\partial\theta} h_j(t) \mid \mathcal{F}_{j-1}) = -\frac{\partial}{\partial\theta} E(g_t(Y_j) \mid \mathcal{F}_{j-1})$
and $E(h_j^2(t) \mid \mathcal{F}_{j-1}) = \mathrm{Var}(g_t(Y_j) \mid \mathcal{F}_{j-1})$.
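
The explicit forms just mentioned can be sketched numerically. The example below assumes, purely for illustration, an integer-valued autoregression with binomial thinning, Poisson innovations with known mean `lam`, and the kernel $t^Y$, so that $c_j(t) = (1 - \alpha + \alpha t)^{Y_{j-1}} \exp(\lambda(t - 1))$ and $\mathrm{Var}(t^{Y_j} \mid \mathcal{F}_{j-1}) = c_j(t^2) - c_j(t)^2$. The model, the bisection solver and all function names are hypothetical choices, not the author's procedure.

```python
import math
import random


def simulate_inar1(alpha, lam, n, seed=0):
    """Illustrative INAR(1) sampler: binomial thinning plus Poisson(lam) innovations."""
    rng = random.Random(seed)

    def poisson():
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    y = [poisson()]
    for _ in range(n):
        y.append(sum(rng.random() < alpha for _ in range(y[-1])) + poisson())
    return y


def cond_pgf(t, y_prev, alpha, lam):
    """c_j(t) = E(t**Y_j | Y_{j-1} = y_prev)."""
    return (1.0 - alpha + alpha * t) ** y_prev * math.exp(lam * (t - 1.0))


def quasi_score_and_info(y, t, alpha, lam):
    """Return (G_n^*(t), I_{G_n^*(t)}) evaluated at alpha, for the kernel t**Y."""
    score, info = 0.0, 0.0
    for j in range(1, len(y)):
        y_prev = y[j - 1]
        c = cond_pgf(t, y_prev, alpha, lam)
        # E(dh_j/dalpha | F_{j-1}) = -dc_j/dalpha
        dh = y_prev * (1.0 - t) * c / (1.0 - alpha + alpha * t)
        # E(h_j^2 | F_{j-1}) = Var(t**Y_j | F_{j-1}) = c_j(t^2) - c_j(t)^2
        var = cond_pgf(t * t, y_prev, alpha, lam) - c * c
        score += (dh / var) * (t ** y[j] - c)
        info += dh * dh / var
    return score, info


def solve_alpha(y, t, lam, lo=0.01, hi=0.99, iters=50):
    """Bisection for G_n^*(t) = 0; assumes a sign change of the quasi-score on [lo, hi]."""
    g_lo = quasi_score_and_info(y, t, lo, lam)[0]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = quasi_score_and_info(y, t, mid, lam)[0]
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y = simulate_inar1(alpha=0.5, lam=1.0, n=2000, seed=2)
alpha_hat = solve_alpha(y, t=0.5, lam=1.0)
score_hat, info_hat = quasi_score_and_info(y, 0.5, alpha_hat, 1.0)
print(alpha_hat, info_hat ** -0.5)  # estimate and a rough scale for its error
```

Here `info_hat ** -0.5` is only a heuristic indication of precision, reflecting the role of $I_{G_n^*(t)}$ as a scale variable in the asymptotic distribution.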
The choice of $h_j(t)$ as the martingale difference generates the family of
classes $\mathcal{M} = \{\mathcal{M}_t, t \in T\}$ with a corresponding
family $\mathcal{G}^* = \{G_n^*(t), t \in T\}$ of quasi-score functions and
a family $\{\hat{\theta}_t, t \in T\}$ of quasi-score estimators. We define
a measure of conditional efficiency, $\mathrm{eff}_c(G_n^*(t), S_n)$, for
the quasi-score $G_n^*(t) \in \mathcal{M}_t$ relative to the score function
$S_n$ as the ratio $I_{G_n^*(t)}/I_{S_n}$. Also, we define the conditional
efficiency, $\mathrm{eff}_c(G_n^*(t_1), G_n^*(t_2))$, of
$G_n^*(t_1) \in \mathcal{M}_{t_1}$ relative to
$G_n^*(t_2) \in \mathcal{M}_{t_2}$ as the ratio
$I_{G_n^*(t_1)}/I_{G_n^*(t_2)}$. The efficiency of a quasi-score function
$G_n^*$ may be defined alternatively in terms of the information quantity
associated with $O_F$-optimality; see McLeish [26] and Merkouris [28].
However, because the quasi-score function $G_n^*$ and its information
$I_{G_n^*}$ are expressed in terms of conditional functionals only, it is
easier to compute $I_{G_n^*}$ than its unconditional counterpart
$E(I_{G_n^*})$, which may not even exist (e.g., in models involving stable
distributions). A justification for a nonasymptotic use of $I_{G_n^*}$ in
measuring efficiency is provided in the next section.

We can now choose the most efficient quasi-score function in
$\mathcal{G}^*$ by maximizing $I_{G_n^*(t)}$ with respect to $t \in T$.
Thus, we will have

    $I_{G_n^*(t)} \le I_{G_n^*(t^*)}$ a.s., for some $t^* \in T$,

and hence,

    $\mathrm{eff}_c(G_n^*(t), S_n) \le \mathrm{eff}_c(G_n^*(t^*), S_n).$

In general, the resulting estimator $\hat{\theta}_{t^*}$ will be adaptive,
in the sense that the value of $t^*$ will be determined by the sample. In
the usual case where $I_{G_n^*(t)}$ depends on the parameter $\theta$, we
may replace $\theta$ in $I_{G_n^*(t)}$ by an initial estimate and then
proceed with the maximization; see Section 4.
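
The plug-in maximization over $t$ can be sketched as a simple grid search. The example below is a sketch under stated assumptions: the model is an illustrative integer-valued autoregression with Poisson innovations of known mean `lam` and kernel $t^Y$, the initial estimate is a conditional least squares value based on $E(Y_j \mid \mathcal{F}_{j-1}) = \alpha Y_{j-1} + \lambda$, and the grid is arbitrary. None of these choices comes from Section 4; all names are hypothetical.

```python
import math
import random


def simulate_inar1(alpha, lam, n, seed=0):
    """Illustrative INAR(1) sampler: binomial thinning plus Poisson(lam) innovations."""
    rng = random.Random(seed)

    def poisson():
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    y = [poisson()]
    for _ in range(n):
        y.append(sum(rng.random() < alpha for _ in range(y[-1])) + poisson())
    return y


def cond_pgf(t, y_prev, alpha, lam):
    """c_j(t) = E(t**Y_j | Y_{j-1} = y_prev)."""
    return (1.0 - alpha + alpha * t) ** y_prev * math.exp(lam * (t - 1.0))


def information(y, t, alpha, lam):
    """Martingale information I_{G_n^*(t)} for the kernel t**Y at a given alpha."""
    info = 0.0
    for j in range(1, len(y)):
        y_prev = y[j - 1]
        if y_prev == 0:
            continue  # c_j(t) does not depend on alpha when Y_{j-1} = 0
        c = cond_pgf(t, y_prev, alpha, lam)
        dc = -y_prev * (1.0 - t) * c / (1.0 - alpha + alpha * t)
        var = cond_pgf(t * t, y_prev, alpha, lam) - c * c
        info += dc * dc / var
    return info


def initial_estimate(y, lam):
    """Conditional least squares for alpha, clipped to (0, 1)."""
    num = sum((y[j] - lam) * y[j - 1] for j in range(1, len(y)))
    den = sum(y[j - 1] ** 2 for j in range(1, len(y)))
    return min(max(num / den, 0.01), 0.99)


def best_t(y, lam, grid):
    """Return the grid point maximizing the plug-in information, and the plug-in value."""
    alpha0 = initial_estimate(y, lam)
    t_star = max(grid, key=lambda t: information(y, t, alpha0, lam))
    return t_star, alpha0


y = simulate_inar1(alpha=0.5, lam=1.0, n=2000, seed=4)
grid = [0.05 * i for i in range(1, 20)]
t_star, alpha0 = best_t(y, 1.0, grid)
print(t_star, alpha0)
```

Because `t_star` is computed from the data, an estimator based on it is adaptive in the sense described above.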

The information contained in the sample $\{Y_j, 1 \le j \le n\}$, and
carried by $\sum_{j=1}^n g_t(Y_j)$ for a specified kernel, is spread
throughout the range of values of $t$. The above procedure of choosing the
most efficient member of $\mathcal{G}^*$ aims to minimize the loss of
information resulting from choosing any particular value for $t$. We may
well then use more points from the set $T$ and extract the maximum
information possible by judiciously choosing their values. This leads to
the consideration of combining (in the sense of Heyde [18]) an arbitrary
number of distinct transform martingale estimating functions. A distinctive
advantage of the transform method is the ready capacity to form combinations
of the form
$\sum_{j=1}^n \sum_{i=1}^k w_{ji} h_j(t_i) = \sum_{j=1}^n \mathbf{w}_j \mathbf{h}_j(\mathbf{t})$
[in obvious notation for the vectors $\mathbf{w}_j$ and
$\mathbf{h}_j(\mathbf{t})$] by using an arbitrary number of points
$t_1, \ldots, t_k \in T$, and thereby producing more efficient transform
quasi-score functions. A procedure for constructing transform-based
composite martingale estimating functions that are statistically and
computationally efficient is described in the next section.

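Transform martingale estimating functions for a $p$-vector parameter
$\theta$ are $p$-dimensional and, for a $k$-vector
$\mathbf{h}_j(\mathbf{t})$, have the general composite form
$\mathbf{G}_n(\mathbf{t}) = \sum_{j=1}^n \mathbf{w}_j \mathbf{h}_j(\mathbf{t})$,
in which the $\mathbf{w}_j$'s are $p \times k$ weighting matrices depending
on $Y_1, \ldots, Y_{j-1}$ and $\theta$. Suppressing $\mathbf{t}$ at the
moment, the optimal $\mathbf{G}_n$ is
$\mathbf{G}_n^* = \sum_{j=1}^n \mathbf{w}_j^* \mathbf{h}_j$, where
$\mathbf{w}_j^* = (E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}))' (E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1}))^{-1}$,
$\dot{\mathbf{h}}_j = (\partial h_{ji}/\partial\theta_l)$ and prime denotes
transpose. $\mathbf{G}_n^*$ maximizes, in the partial order of nonnegative
definite matrices, the information matrix
$\mathbf{I}_{G_n} = \bar{\mathbf{G}}_n' \langle \mathbf{G} \rangle_n^{-1} \bar{\mathbf{G}}_n$,
where
$\langle \mathbf{G} \rangle_n = \sum_{j=1}^n E[(\mathbf{w}_j \mathbf{h}_j)(\mathbf{w}_j \mathbf{h}_j)' \mid \mathcal{F}_{j-1}]$
and
$\bar{\mathbf{G}}_n = \sum_{j=1}^n \mathbf{w}_j E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1})$.
Both $\langle \mathbf{G} \rangle_n$ and $\bar{\mathbf{G}}_n$ are assumed to
be, a.s., nonsingular for each $n \ge 1$. For $\mathbf{G}_n^*$, it holds
that
$\mathbf{I}_{G_n^*} = \langle \mathbf{G}^* \rangle_n = \sum_{j=1}^n E[(\mathbf{w}_j^* \mathbf{h}_j)(\mathbf{w}_j^* \mathbf{h}_j)' \mid \mathcal{F}_{j-1}]$.
In analogy with a definition of efficiency, dating back to McLeish [26],
which uses as measure of the size of the $O_F$ information matrix its
determinant, we may define the conditional efficiency,
$\mathrm{eff}_c(\mathbf{G}_n^*, \mathbf{S}_n)$, of $\mathbf{G}_n^*$ as the
ratio $|\mathbf{I}_{G_n^*}|/|\mathbf{I}_{S_n}|$; elaboration on measures of
efficiency based on the martingale information matrix can be found in
Merkouris [28].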

The procedure extends readily to multivariate observations. For
$r$-dimensional random variables $\mathbf{Y}_j = (Y_{j1}, \ldots, Y_{jr})$,
$1 \le j \le n$, the kernels are multivariate, with an $r$-dimensional index
set, and are constructed as products of univariate kernels, that is,

(8)  $g_{\mathbf{t}}(\mathbf{Y}_j) = \prod_{i=1}^r g_{t_i}(Y_{ji}), \qquad \mathbf{t} = (t_1, \ldots, t_r) \in T^r.$

2.3. A link with a Fourier method for i.i.d. variables. An interesting
link of the proposed method of estimation with an existing transform-based
method for i.i.d. variables is established as follows. Using the more
suggestive notation $s_j = s(Y_j; \mathcal{F}_{j-1})$, the score function
$S_n = \sum_{j=1}^n s_j$ can be expressed as

(9)  $\sum_{j=1}^n s(Y_j; \mathcal{F}_{j-1}) = \sum_{j=1}^n \int \omega_j(t) \exp(itY_j)\, dt,$

where

    $\omega_j(t) = \frac{1}{2\pi} \int s(y_j; \mathcal{F}_{j-1}) \exp(-ity_j)\, dy_j, \qquad 1 \le j \le n,$

is the inverse Fourier transform of $s(y_j; \mathcal{F}_{j-1})$. When the
form of $\omega_j(t)$ is no more tractable than that of
$s(Y_j; \mathcal{F}_{j-1})$, the integral in (9) may be approximated
arbitrarily closely by a step function, say,
$\sum_{i=1}^k w_j(t_i) \exp(it_iY_j)$, the coefficients
$w_{ji} = w_j(t_i)$ being functions of $Y_1, \ldots, Y_{j-1}$ and $\theta$
as well as $t$. Then the score function can be written as

(10)  $\sum_{j=1}^n \int s(y; \mathcal{F}_{j-1})\, d\hat{F}_j(y) = \sum_{j=1}^n \int \sum_{i=1}^k w_{ji} \exp(it_iy)\, d[\hat{F}_j(y) - F_j(y \mid \mathcal{F}_{j-1})],$

noticing that the term involving $F_j(y \mid \mathcal{F}_{j-1})$ is
identically 0. In view of the transforms (1) and (2), the right-hand side
of (10) can be written in the form

(11)  $\sum_{j=1}^n \sum_{i=1}^k w_{ji} [\exp(it_iY_j) - E(\exp(it_iY_j) \mid \mathcal{F}_{j-1})].$

In the case of i.i.d. random variables, this approach leads to the
generalized method of moments of Feuerverger and McDunnough [12, 13] based
on the kernel $g_t(Y_j) = \exp(itY_j)$ and appropriate weights $w_{ji}$. A
linear approximation of the score function for general classes of kernels
was considered in the i.i.d. case by Feuerverger and McDunnough [14], and
more extensively by Brant [4]. In the stochastic process context this leads
to the generalization of (11),

(12)  $\sum_{j=1}^n \sum_{i=1}^k w_{ji} [g_{t_i}(Y_j) - E(g_{t_i}(Y_j) \mid \mathcal{F}_{j-1})] = \sum_{j=1}^n \sum_{i=1}^k w_{ji} h_j(t_i) = \sum_{i=1}^k \sum_{j=1}^n w_{ji} h_j(t_i),$

which can be viewed as a combination of $k$ estimating functions of the
form put forward earlier in this section. Thus, Fourier transform methods of
estimation and more general transform-based linear approximations of the
score function can be incorporated into the general quasi-likelihood theory.

3. Combinations of transform martingale estimating functions. In this 
section we develop a method of optimally combining transform-based mar- 
tingales into quasi-score functions of nondecreasing conditional efficiency 
(as defined in Section 2.2). The construction of such optimal composite esti- 
mating functions, which for the main transforms can attain arbitrarily high 
efficiency, is founded on an alternative formulation of the optimality of a 
martingale estimating function based on the concept of orthogonal projec- 
tion. 

3.1. Optimality and orthogonal projection. Consider first the space
$L^2 = L^2(\Omega, \mathcal{F}, P_\theta)$ of (equivalence classes of)
random variables on $(\Omega, \mathcal{F}, P_\theta)$ which are square
integrable (i.e., with finite second moment). Endowed with inner product
$\langle X, Y \rangle = E(XY)$ and norm
$\|X\| = \langle X, X \rangle^{1/2}$, the space $L^2$ is a Hilbert space.
Let $\mathcal{A}$ be a closed subspace of $L^2$. For $X \in L^2$, let
$E^*(X \mid \mathcal{A})$ denote the unique element in $\mathcal{A}$ such
that

    $\|X - E^*(X \mid \mathcal{A})\|^2 = \inf_{Z \in \mathcal{A}} \|X - Z\|^2 = \inf_{Z \in \mathcal{A}} E|X - Z|^2,$

that is, $E^*(X \mid \mathcal{A})$ is the orthogonal projection of $X$ on
$\mathcal{A}$.

Next consider the class $\mathcal{M}$ of general martingale estimating
functions of the form $G_n = \sum_{j=1}^n \mathbf{w}_j \mathbf{h}_j$,
generated by a basic martingale
$\{\mathbf{H}_n = \sum_{j=1}^n \mathbf{h}_j, \mathcal{F}_n\}$, with
$k$-dimensional martingale differences
$\mathbf{h}_j = (h_{j1}, \ldots, h_{jk})'$ and with $k$-vector coefficients
$\mathbf{w}_j = (w_{j1}, \ldots, w_{jk})$ that are
$\mathcal{F}_{j-1}$-measurable functions depending on a scalar parameter
$\theta$. The class $\mathcal{M}$ is a linear subspace of functions spanned
by the $\mathbf{h}_j$'s. Henceforth, we will refer to the class
$\mathcal{M}$ as a linear space, or simply a space.

A projection representation of the optimal martingale estimating function
in the space $\mathcal{M}$ that appeared first in Merkouris [27] is
formalized here in the following lemma. For a Hilbert space approach to
general estimating functions based on the notion of E-sufficiency, see
Small and McLeish [34].

Lemma 1. The martingale quasi-score function
$G_n^* = \sum_{j=1}^n \mathbf{w}_j^* \mathbf{h}_j$, with
$\mathbf{w}_j^* = (E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}))' (E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1}))^{-1}$,
is the orthogonal projection of the score function $S_n$ onto the space
$\mathcal{M}$.

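Proof. Let $\mathcal{M}_j$ denote the subspace of functions of the form
$g_j = \mathbf{w}_j \mathbf{h}_j$, $1 \le j \le n$. Since the functions
$\mathbf{h}_j$ are orthogonal, that is, $E(\mathbf{h}_i \mathbf{h}_j') = 0$
for $i \ne j$, the space $\mathcal{M}$ is the direct sum of the subspaces
$\mathcal{M}_j$, that is,

(13)  $\mathcal{M} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_n.$

Now consider the Hilbert spaces $L^2(\Omega, \mathcal{F}_j, P_\theta)$,
$1 \le j \le n$, where $P_\theta$ is the probability measure restricted to
$\mathcal{F}_j$. Furthermore, consider the subspaces
$\mathcal{B}_j \subset L^2(\Omega, \mathcal{F}_j, P_\theta)$ of all
measurable square integrable functions of $Y_1, \ldots, Y_j$ with
conditional mean zero and differentiable with respect to $\theta$. Note here
that the $j$th term of the score function, $s_j$, belongs to
$\mathcal{B}_j$ and the assumptions for $\mathcal{M}_j$ imply that
$\mathcal{M}_j \subset \mathcal{B}_j$.

Observe that since
$E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}) = E(s_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})$
a.s., under a mild regularity condition $\mathbf{w}_j^*$ is the coefficient
of $\mathbf{h}_j$ in the orthogonal projection
$g_j^* = \mathbf{w}_j^* \mathbf{h}_j$ of $s_j \in \mathcal{B}_j$ onto the
subspace $\mathcal{M}_j \subset \mathcal{B}_j$. Specifically, denoting by
$\|\cdot\|_{\mathcal{F}_{j-1}}$ the norm induced by $P_\theta$ conditional
on $\mathcal{F}_{j-1}$, the element $g_j^* \in \mathcal{M}_j$ is such that

(14)  $\|s_j - g_j^*\|^2_{\mathcal{F}_{j-1}} = \inf_{g_j \in \mathcal{M}_j} \|s_j - g_j\|^2_{\mathcal{F}_{j-1}}$ a.s.,

for all $\theta \in \Theta$. The $\mathcal{F}_{j-1}$-measurable
$\mathbf{w}_j^*$ arises then as the solution of the system of $k$ equations
that express the orthogonality condition
$E[(s_j - \mathbf{w}_j \mathbf{h}_j) \mathbf{h}_j' \mid \mathcal{F}_{j-1}] = 0$.
By orthogonality, we also obtain
$\|s_j\|^2_{\mathcal{F}_{j-1}} = \|g_j^*\|^2_{\mathcal{F}_{j-1}} + \|s_j - g_j^*\|^2_{\mathcal{F}_{j-1}}$
and, by passage to the sum, the decomposition of the conditional Fisher
information $I_{S_n} = \sum_j \|s_j\|^2_{\mathcal{F}_{j-1}}$ into the
(observed) information of the quasi-score function
$I_{G_n^*} = \sum_j \|g_j^*\|^2_{\mathcal{F}_{j-1}} = \sum_{j=1}^n E[(\mathbf{w}_j^* \mathbf{h}_j)(\mathbf{w}_j^* \mathbf{h}_j)' \mid \mathcal{F}_{j-1}]$
and the minimized sum of residuals
$\sum_j \|s_j - g_j^*\|^2_{\mathcal{F}_{j-1}}$.

Now, since $s_j$ and $g_j$ are elements of
$L^2(\Omega, \mathcal{F}_j, P_\theta)$, we can take the expectation of
$\|s_j - g_j\|^2_{\mathcal{F}_{j-1}}$ to obtain

    $\|s_j - g_j^*\|^2 = \inf_{g_j \in \mathcal{M}_j} \|s_j - g_j\|^2$

from (14), and so we can formally write
$g_j^* = E^*(s_j \mid \mathcal{M}_j)$. Moreover, using the linearity
property of the projection operator, the decomposition (13) and the fact
that the martingale differences $s_i$ and $\mathbf{h}_j$, $i \ne j$, are
mutually orthogonal, we have

    $E^*(S_n \mid \mathcal{M}) = \sum_{j=1}^n E^*(s_j \mid \mathcal{M}) = \sum_{j=1}^n E^*(s_j \mid \mathcal{M}_j) = \sum_{j=1}^n g_j^* = G_n^*.$

Therefore, the quasi-score function $G_n^*$ is the unique orthogonal
projection of the score function $S_n$ onto $\mathcal{M}$. □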
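
The orthogonality step in the proof can be unpacked as a short derivation. This is a sketch of the standard normal-equations argument for scalar $\theta$, assuming (for the inversion) that $E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})$ is a.s. nonsingular and using the $\mathcal{F}_{j-1}$-measurability of $\mathbf{w}_j$ to take it outside the conditional expectation.

```latex
\begin{align*}
E\big[(s_j - \mathbf{w}_j \mathbf{h}_j)\,\mathbf{h}_j' \mid \mathcal{F}_{j-1}\big] = 0
  &\;\Longrightarrow\;
  \mathbf{w}_j\, E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})
    = E(s_j \mathbf{h}_j' \mid \mathcal{F}_{j-1}) \\
  &\;\Longrightarrow\;
  \mathbf{w}_j = E(s_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})\,
    \big(E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})\big)^{-1}, \\[4pt]
\|\mathbf{w}_j \mathbf{h}_j\|^2_{\mathcal{F}_{j-1}}
  = \mathbf{w}_j\, E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})\, \mathbf{w}_j'
  &= E(s_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})\,
     \big(E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1})\big)^{-1}
     E(\mathbf{h}_j s_j \mid \mathcal{F}_{j-1}).
\end{align*}
```

For $k = 1$ the last expression reduces to $[E(s_j h_j \mid \mathcal{F}_{j-1})]^2 / E(h_j^2 \mid \mathcal{F}_{j-1})$, the $j$th term of the information of the quasi-score function.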

A few remarks are in order. 

Remark 1. As expectation conditional on $\mathcal{F}_{j-1}$, the inner
product that induces the norm $\|\cdot\|_{\mathcal{F}_{j-1}}$ used in (14)
is an element of the space of all $\mathcal{F}_{j-1}$-measurable functions,
say, $\mathcal{M}_{\mathcal{F}_{j-1}}$. Such an inner product, with the
associated space of "scalars" being $\mathcal{M}_{\mathcal{F}_{j-1}}$, is
well defined, its defining properties holding a.s. For a comparable inner
product used in a similar context, see Murphy and Li [29]. It is important
to note here that if the $\mathbf{h}_j$'s are only conditionally square
integrable (see Example 1 in Section 6), then the result of the lemma holds,
but restricted to term-wise projection of the score function onto the
$\mathcal{M}_j$'s using the norm $\|\cdot\|_{\mathcal{F}_{j-1}}$. In this
case the $O_F$ information quantity $E(I_{G_n^*})$ does not exist.

Remark 2. For a $p$-vector parameter $\theta$, each component $s_{ji}$ of
$\mathbf{s}_j$, $i = 1, \ldots, p$, is approximated by its projection
$g_{ji}^* = \mathbf{w}_{ji}^* \mathbf{h}_j$ onto the same space
$\mathcal{M}_j$. More compactly, we write
$\mathbf{g}_j^* = \mathbf{w}_j^* \mathbf{h}_j$, where the $p \times k$
matrix $\mathbf{w}_j^* = (\mathbf{w}_{j1}^*, \ldots, \mathbf{w}_{jp}^*)$ is
given by
$\mathbf{w}_j^* = (E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}))' (E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1}))^{-1}$.
Both $\mathbf{g}_j^*$ and $\mathbf{s}_j = (s_{j1}, \ldots, s_{jp})'$ are
elements of the set $L^2_p$ of random $p$-vectors with all components in
$L^2(\Omega, \mathcal{F}_j, P_\theta)$, which is a Hilbert space when the
inner product is defined to be
$\langle \mathbf{X}, \mathbf{Y} \rangle = \mathrm{tr}\, E(\mathbf{X}\mathbf{Y}')$
for all $\mathbf{X}, \mathbf{Y} \in L^2_p$. In this Hilbert space, the
vector $\mathbf{g}_j^*$ can be characterized as the projection of
$\mathbf{s}_j$ onto $\mathcal{M}_j^p$, the $p$-fold Cartesian product of
$\mathcal{M}_j$ with itself, and $\mathbf{G}_n^*$ as the projection of
$\mathbf{S}_n$ onto
$\mathcal{M}^p = \mathcal{M}_1^p \oplus \cdots \oplus \mathcal{M}_n^p$.
Remark 3. Enlarging the space of martingale estimating functions increases
the information of the corresponding martingale quasi-score function. For
an increasing sequence of spaces $\{\mathcal{M}_k\}$ of martingale
estimating functions with corresponding sequence of quasi-score functions
$\{G_{n,k}^*\}$, we have
$G_{n,k+1}^* = E^*(S_n \mid \mathcal{M}_{k+1})$ and, by a well-known
projection property,

    $E^*(G_{n,k+1}^* \mid \mathcal{M}_k) = E^*(S_n \mid \mathcal{M}_k) = G_{n,k}^*,$

that is, $G_{n,k}^*$ is the orthogonal projection of $G_{n,k+1}^*$ onto
$\mathcal{M}_k$, and $\{G_{n,k}^*, \mathcal{M}_k\}$ is a "projection
martingale." It follows easily that
$\|S_n - G_{n,k+1}^*\|^2 \le \|S_n - G_{n,k}^*\|^2$. In view of (14), this
also entails $I_{G_{n,k}^*} \le I_{G_{n,k+1}^*}$ a.s.

Remark 4. Combining martingale estimating functions essentially increases
the dimensionality of $\mathbf{h}_j$, for each $j$, and hence, of the space
$\mathcal{M}_j$ spanned by its components. Thus, according to the previous
remark, a closer approximation of the score function can be achieved. In the
furthermost extension of the combination procedure, aiming for full
efficiency, we consider the closure $\bar{\mathcal{M}}_j$ of the
infinite-dimensional space $\mathcal{M}_j$ spanned by a countable set
$\{h_{j1}, h_{j2}, \ldots\}$ of martingale differences. The limits
associated with $\bar{\mathcal{M}}_j$ are in the norm
$\|\cdot\|_{\mathcal{F}_{j-1}}$. If the set $\{h_{j1}, h_{j2}, \ldots\}$ is
complete in $\mathcal{B}_j \subset L^2(\Omega, \mathcal{F}_j, P_\theta)$,
that is, if $\bar{\mathcal{M}}_j = \mathcal{B}_j$, for all
$1 \le j \le n$, then

    $g_j^* = E^*(s_j \mid \bar{\mathcal{M}}_j) = E^*(s_j \mid \mathcal{B}_j) = s_j,$

and hence $G_n^* = S_n$. This possibility is explored next using transform
martingale estimating functions.

3.2. Optimal combinations of transform martingales. We assume, for clarity,
that $\theta$ is scalar and recall that the most efficient quasi-score
function $G_n^*(t^*)$ in the family
$\mathcal{G}^* = \{G_n^*(t), t \in T\}$ of quasi-score functions
corresponding to the family of distinct spaces
$\mathcal{M} = \{\mathcal{M}_t, t \in T\}$ is obtained by maximizing
$I_{G_n^*(t)}$ with respect to $t \in T$. The construction of composite
martingale estimating functions with the use of more points $t \in T$
involves increasing the dimension of the basic transform martingale and,
hence, the dimension of the associated space of transform martingale
estimating functions. The proposed procedure generates in a stepwise manner
an increasing sequence of spaces of transform martingale estimating
functions by retaining the optimal points $t^* \in T$ determined in the
preceding steps. This facilitates the comparison of composite martingale
quasi-scores from different spaces and ensures increasing efficiency with
increasing number of points $t \in T$.

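Proceeding formally, we write $t_l^*$, $1 \le l \le k - 1$, $k \ge 2$, for
the optimal points in $T$ determined in the first $k - 1$ steps. At the
$k$th step of the approximation, with a new point $t_k \ne t_l^*$, we will
have the $k \times 1$ vector martingale
$\{\mathbf{H}_n(t_1^*, \ldots, t_{k-1}^*, t_k) = (\sum_{j=1}^n h_j(t_1^*), \ldots, \sum_{j=1}^n h_j(t_{k-1}^*), \sum_{j=1}^n h_j(t_k))', \mathcal{F}_n\}$
with associated space $\mathcal{M}_{t_1^*, \ldots, t_{k-1}^*, t_k}$ of
composite martingale estimating functions of the form

    $G_n(t_1^*, \ldots, t_{k-1}^*, t_k) = \sum_{j=1}^n \mathbf{w}_j \mathbf{h}_j = \sum_{j=1}^n \Big( \sum_{l=1}^{k-1} w_{jl} h_j(t_l^*) + w_{jk} h_j(t_k) \Big),$

with $t_l^*$ fixed. Then the quasi-score function within
$\mathcal{M}_{t_1^*, \ldots, t_{k-1}^*, t_k}$ is given by

    $G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k) = \sum_{j=1}^n \mathbf{w}_j^* \mathbf{h}_j,$

with
$\mathbf{w}_j^* = (E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}))' (E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1}))^{-1}$,
and the information in $G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k)$ is

    $I_{G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k)} = \sum_{j=1}^n (E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}))' (E(\mathbf{h}_j \mathbf{h}_j' \mid \mathcal{F}_{j-1}))^{-1} E(\dot{\mathbf{h}}_j \mid \mathcal{F}_{j-1}).$

Clearly,
$\mathcal{M}_{t_1^*, \ldots, t_{k-1}^*} \subset \mathcal{M}_{t_1^*, \ldots, t_{k-1}^*, t_k}$
for any $t_k \in T$, and since $G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k)$ is the
projection of $S_n$ onto $\mathcal{M}_{t_1^*, \ldots, t_{k-1}^*, t_k}$, we
have
$I_{G_n^*(t_1^*, \ldots, t_{k-1}^*)} \le I_{G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k)}$
a.s. for any $t_k \in T$ (by Remark 3, Section 3.1). From the family
$\{G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k), t_k \in T\}$ we can now choose the
most efficient quasi-score function by maximizing
$I_{G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k)}$ with respect to $t_k \in T$.
Thus, for some $t_k^* \in T$, we will have

(15)  $G_n^*(t_1^*, \ldots, t_k^*) = E^*(S_n \mid \mathcal{M}_{t_1^*, \ldots, t_k^*}) = \sum_{j=1}^n \sum_{l=1}^k w_{jl}^* h_j(t_l^*),$

satisfying

(16)  $I_{G_n^*(t_1^*, \ldots, t_{k-1}^*)} \le I_{G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k)} \le I_{G_n^*(t_1^*, \ldots, t_{k-1}^*, t_k^*)}$ a.s.,

and hence

    $\mathrm{eff}_c(G_n^*(t_1^*, \ldots, t_{k-1}^*), S_n) \le \mathrm{eff}_c(G_n^*(t_1^*, \ldots, t_k^*), S_n)$ a.s.

Moreover,

(17)  $\mathrm{eff}_c(G_n^*(t_1^*, \ldots, t_k^*), S_n) \le \mathrm{eff}_c(G_n^*(t_1^*, \ldots, t_k^*), G_n^*(t_1^*, \ldots, t_{k+1}^*))$ a.s.

It is worth noting that the inequality (16) provides a nondecreasing
sequence of lower bounds for the score information $I_{S_n}$, while the
inequality (17) places an upper bound to the unknown efficiency of
$G_n^*(t_1^*, \ldots, t_k^*)$. The procedure of adding components to
$G_n^*(t_1^*, \ldots, t_k^*)$ may be terminated when
$I_{G_n^*(t_1^*, \ldots, t_k^*)}$ ceases to increase substantially with $k$
or when computations become prohibitive.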
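
The stepwise construction can be sketched as a greedy search over a grid of candidate points. The sketch below makes several assumptions that are not in the text: an illustrative integer-valued autoregression with Poisson innovations and kernel $t^Y$, for which $\mathrm{Cov}(t_i^{Y_j}, t_l^{Y_j} \mid \mathcal{F}_{j-1}) = c_j(t_i t_l) - c_j(t_i) c_j(t_l)$; a fixed plug-in value of `alpha` standing in for an initial estimate; and a small hand-written linear solver. All function names are hypothetical.

```python
import math
import random


def simulate_inar1(alpha, lam, n, seed=0):
    """Illustrative INAR(1) sampler: binomial thinning plus Poisson(lam) innovations."""
    rng = random.Random(seed)

    def poisson():
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    y = [poisson()]
    for _ in range(n):
        y.append(sum(rng.random() < alpha for _ in range(y[-1])) + poisson())
    return y


def cond_pgf(t, y_prev, alpha, lam):
    """c_j(t) = E(t**Y_j | Y_{j-1} = y_prev)."""
    return (1.0 - alpha + alpha * t) ** y_prev * math.exp(lam * (t - 1.0))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def composite_information(y, ts, alpha, lam):
    """Sum over j of d_j' V_j^{-1} d_j for the vector h_j(t_1), ..., h_j(t_k)."""
    total, cache = 0.0, {}
    for j in range(1, len(y)):
        yp = y[j - 1]
        if yp == 0:
            continue  # the conditional transform is free of alpha when Y_{j-1} = 0
        if yp not in cache:  # the j-th term depends on j only through Y_{j-1}
            c = [cond_pgf(t, yp, alpha, lam) for t in ts]
            d = [-yp * (1.0 - t) * ci / (1.0 - alpha + alpha * t) for t, ci in zip(ts, c)]
            v = [[cond_pgf(ti * tl, yp, alpha, lam) - ci * cl for tl, cl in zip(ts, c)]
                 for ti, ci in zip(ts, c)]
            cache[yp] = sum(di * xi for di, xi in zip(d, solve(v, d)))
        total += cache[yp]
    return total


def stepwise_points(y, grid, alpha, lam, k_max):
    """Greedy selection: keep earlier optimal points, add the best new grid point."""
    chosen, infos = [], []
    for _ in range(k_max):
        candidates = [t for t in grid if t not in chosen]
        best = max(candidates,
                   key=lambda t: composite_information(y, chosen + [t], alpha, lam))
        chosen.append(best)
        infos.append(composite_information(y, chosen, alpha, lam))
    return chosen, infos


y = simulate_inar1(alpha=0.5, lam=1.0, n=1000, seed=3)
grid = [0.1 * i for i in range(1, 10)]
points, infos = stepwise_points(y, grid, alpha=0.5, lam=1.0, k_max=3)
print(points)
print(infos)  # nondecreasing in k, in line with (16)
```

In practice the successive values in `infos` give the kind of stopping signal described above: the search may be halted once the increase from one step to the next becomes negligible.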
The composite quasi-score function in (15) provides a finite approximation
to the score function $S_n$ in $L^2(\Omega, \mathcal{F}, P_\theta)$. We may
consider next an arbitrarily close approximation of $S_n$ by projection onto
infinite-dimensional spaces of transform martingale estimating functions.
Thus, for each $j$, let $\{h_j(t_1), h_j(t_2), \ldots\}$ be a countable set
formed by evaluating the martingale difference $h_j(\cdot)$ at distinct
points $t_1, t_2, \ldots$ in $T$. Let
$\bar{\mathcal{M}}_{j;t_1,t_2,\ldots}$ be the closed linear subspace spanned
by this set. $\bar{\mathcal{M}}_{j;t_1,t_2,\ldots}$ consists of those
zero-mean elements of $L^2(\Omega, \mathcal{F}_j, P_\theta)$ which can be
approximated in the norm $\|\cdot\|_{\mathcal{F}_{j-1}}$ by finite linear
combinations of elements of $\{h_j(t_1), h_j(t_2), \ldots\}$ with
$\mathcal{F}_{j-1}$-measurable coefficients. Note here that a set complete
in $L^2(\Omega, \mathcal{F}_j, P_\theta)$ is characterized by the property
that $L^2(\Omega, \mathcal{F}_j, P_\theta)$ contains no nonzero function
orthogonal to all elements of the set (see Burrill [6], page 216). It
follows then that the set $\{h_j(t_1), h_j(t_2), \ldots\}$ is complete a.s.
in the subspace
$\mathcal{B}_j \subset L^2(\Omega, \mathcal{F}_j, P_\theta)$, that is,
$\bar{\mathcal{M}}_{j;t_1,t_2,\ldots} = \mathcal{B}_j$ a.s., if the set
$\{g_{t_1}(Y_j), g_{t_2}(Y_j), \ldots\}$ is complete in
$L^2(\Omega, \mathcal{F}_j, P_\theta)$ in the norm $\|\cdot\|$. This leads
to the following theorem.

Theorem 1. Suppose that for a kernel class of functions $\{g_t(Y),\ t\in T\}$
the countable set $\{g_{t_1}(Y_j),g_{t_2}(Y_j),\dots\}$ is complete in $L^2(\Omega,\mathcal{F}_j,P_\theta)$ for all $j$.
Then the score function $S_n$ can be approximated in $L^2(\Omega,\mathcal{F},P_\theta)$ arbitrarily
closely by the transform quasi-score function $G_n^*(t_1,\dots,t_k)$ with $k$ sufficiently
large.

Proof. Since $s_j\in B_j$, the completeness of the set $\{g_{t_1}(Y_j),g_{t_2}(Y_j),\dots\}$
in $L^2(\Omega,\mathcal{F}_j,P_\theta)$ implies $s_j\in\mathcal{M}_{j;t_1,t_2,\dots}$ a.s. Then $s_j$ is a limit point in
$\mathcal{M}_{j;t_1,t_2,\dots}$ and, thus, for each $\varepsilon>0$ there exists a finite combination
$g_j(t_1,t_2,\dots,t_K)=\sum_{l=1}^Kw_{jl}h_j(t_l)$, an element of the subspace $\mathcal{M}_{j;t_1,t_2,\dots,t_K}$, for which
$\|s_j-g_j(t_1,t_2,\dots,t_K)\|^2_{\mathcal{F}_{j-1}}<\varepsilon$ a.s. But by (14),

$$\|s_j-g_j^*(t_1,t_2,\dots,t_K)\|^2_{\mathcal{F}_{j-1}}\le\|s_j-g_j(t_1,t_2,\dots,t_K)\|^2_{\mathcal{F}_{j-1}}<\varepsilon\quad\text{a.s.},$$

for the projection $g_j^*(t_1,t_2,\dots,t_K)=\sum_{l=1}^Kw_{jl}^*h_j(t_l)$ of $s_j$ onto $\mathcal{M}_{j;t_1,t_2,\dots,t_K}$.
Now, by Remark 3, Section 3.1,

$$\|s_j-g_j^*(t_1,t_2,\dots,t_{K+1})\|^2_{\mathcal{F}_{j-1}}\le\|s_j-g_j^*(t_1,t_2,\dots,t_K)\|^2_{\mathcal{F}_{j-1}}\quad\text{a.s.},$$

and hence

$$\|s_j-g_j^*(t_1,t_2,\dots,t_k)\|^2_{\mathcal{F}_{j-1}}<\varepsilon\quad\text{a.s., for all }k>K.$$

This implies

$$\|s_j-g_j^*(t_1,t_2,\dots,t_k)\|^2_{\mathcal{F}_{j-1}}\to0\quad\text{a.s., as }k\to\infty,$$

and by the decomposition $\|s_j\|^2_{\mathcal{F}_{j-1}}=\|g_j^*(t_1,t_2,\dots,t_k)\|^2_{\mathcal{F}_{j-1}}+\|s_j-g_j^*(t_1,t_2,\dots,t_k)\|^2_{\mathcal{F}_{j-1}}$
it also implies

$$\|g_j^*(t_1,t_2,\dots,t_k)\|^2_{\mathcal{F}_{j-1}}\to\|s_j\|^2_{\mathcal{F}_{j-1}}\quad\text{a.s.}\tag{18}$$

By the (unconditional) square integrability of the elements $s_j$ and $g_j^*(t_1,t_2,\dots,t_k)$
of $L^2(\Omega,\mathcal{F}_j,P_\theta)$, we have

$$\|s_j-g_j^*(t_1,t_2,\dots,t_k)\|^2\to0\quad\text{as }k\to\infty,$$



or, equivalently, 



It follows that

$$s_j=g_j^*(t_1,t_2,\dots)=\sum_{l=1}^\infty w_{jl}^*h_j(t_l)$$

and

$$S_n=G_n^*(t_1,t_2,\dots)=\sum_{j=1}^n\sum_{l=1}^\infty w_{jl}^*h_j(t_l),$$

where $G_n^*(t_1,t_2,\dots)$ is the quasi-score function within the space $\mathcal{M}_{t_1,t_2,\dots}=\mathcal{M}_{1;t_1,t_2,\dots}\oplus\cdots\oplus\mathcal{M}_{n;t_1,t_2,\dots}$,
that is, the unique orthogonal projection $E^*(S_n|\mathcal{M}_{t_1,t_2,\dots})$.
Therefore, the score function $S_n$ can be approximated arbitrarily
closely by the partial sums $G_n^*(t_1,\dots,t_k)=\sum_{j=1}^n\sum_{l=1}^kw_{jl}^*h_j(t_l)$.
Moreover, by (18),

$$I_{S_n}=\lim_{k\to\infty}I_{G_n^*(t_1,t_2,\dots,t_k)}\quad\text{a.s.},$$

so that $\lim_{k\to\infty}\mathrm{eff}_c(G_n^*(t_1,t_2,\dots,t_k),S_n)=1$ a.s. □

Remark 5. Completeness ensures full efficiency of the transform quasi-score
function $G_n^*(t_1,t_2,\dots)$, but is not necessary. It would suffice that $S_n\in\mathcal{M}_{t_1,\dots,t_k}$
for some $k$ and $t_1,\dots,t_k$ in $T$, or $S_n\in\mathcal{M}_{t_1,t_2,\dots}$. The completeness
of kernel sets for the main transforms is shown in Feuerverger and McDunnough [14].

Remark 6. In the case that the probability measures $P_\theta$ are concentrated
on a finite set of points, say, $\{a_1,\dots,a_N\}$, the spaces $L^2(\Omega,\mathcal{F}_j,P_\theta)$
are $N$-dimensional, and then any finite set $\{g_{t_1},\dots,g_{t_N}\}$ of linearly independent
kernel functions will be complete. Linear independence of the kernel
functions requires that the vectors $(g_{t_l}(a_1),\dots,g_{t_l}(a_N))'$, $1\le l\le N$, be independent
for some values of $t_1,\dots,t_N$. Any such choice of values is optimal
and yields $G_n^*(t_1,\dots,t_N)=S_n$.
Remark 7. In general, full efficiency may be achieved with a finite
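number of points in $T$ for some terms of $S_n$, that is, $s_j=\sum_{l=1}^kw_{jl}^*h_j(t_l)$, if
$s_j\in\mathcal{M}_{j;t_1,\dots,t_k}$ for some $j$. In particular, $s_j=w_{j1}^*h_j(t)$ may hold for some $t\in T$
when the conditional density $f(Y_j|Y_1,\dots,Y_{j-1})$ belongs to the exponential
family. This can occur, for example, in applications to aggregate Markov
chains. In such situations projecting $s_j$ onto the larger space of functions of
the form $g_j(t_1,t_2)=w_{j1}h_j(t_1)+w_{j2}h_j(t_2)$ results in $s_j=w_{j1}^*h_j(t_1)+0\cdot h_j(t_2)$
and $I_{g_j^*(t_1,t_2)}=I_{g_j^*(t_1)}$. It is not difficult to show that this is equivalent to
$w_{j2}^*=0$, which can be written as

$$\frac{\partial}{\partial\theta}E(g_{t_2}(Y_j)|\mathcal{F}_{j-1})\,\mathrm{Var}(g_{t_1}(Y_j)|\mathcal{F}_{j-1})-\frac{\partial}{\partial\theta}E(g_{t_1}(Y_j)|\mathcal{F}_{j-1})\,\mathrm{Cov}(g_{t_1}(Y_j),g_{t_2}(Y_j)|\mathcal{F}_{j-1})=0.$$

This condition is necessary for $f(Y_j|Y_1,\dots,Y_{j-1})$ to belong to the exponential
family with $g_{t_1}(Y_j)$ being the sufficient statistic for $\theta$. If, as in some
situations, the sufficient statistic is $g_{t_1}(Y_j)=Y_j^{t_1}$, $t_1=1$, then

$$\sum_{j=1}^nw_{j1}^*h_j(t_1)=-\sum_{j=1}^n\frac{\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})}{\mathrm{Var}(Y_j|\mathcal{F}_{j-1})}[Y_j-E(Y_j|\mathcal{F}_{j-1})].$$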

The multiparameter case can be treated in a similar manner. It is important
to note that, since the sequence of the constructed spaces is increasing,
a matrix version of the inequality (16) holds in the partial order of nonnegative
matrices. This enables us to choose at the $k$th step of the procedure
the optimal quasi-score $G_n^*(t_1^*,\dots,t_{k-1}^*,t_k)$ by maximizing the determinant
of the matrix $I_{G_n^*(t_1^*,\dots,t_{k-1}^*,t_k)}$ with respect to $t_k$. In the case of multivariate
(say, $r$-dimensional) observations, the $k\times1$ vector martingale difference
$\mathbf{h}_j=(h_j(t_1),\dots,h_j(t_k))'$ involves $k$ $r$-tuples in $T^r$. Apart from the computational
difficulties associated with the choice of appropriate values of
$t_1,\dots,t_k$ for large $r$, the described methodology carries over to this case in
a straightforward manner.

4. Kernel classes and computational issues. The statistical and computational
efficiency of the transform method depends on the associated kernel,
the number $k$ of points $t_1,\dots,t_k$, and the choice of the values of these points
at which the kernel and its conditional expectation are evaluated.

Conditional transforms conveniently describing stochastic process models 
are, commonly, conditional versions of the characteristic function, the mo- 
ment generating function, the Laplace transform and the probability gener- 
ating function. The sequence of conditional moments can be derived from 
these transforms. As Brant [4] notes in the context of i.i.d. variables, the
corresponding kernel classes are closed under multiplication, that is,

$$g_t(Y)g_s(Y)=g_{v(t,s)}(Y)\quad\text{for all }t,s\text{ in }T,\tag{19}$$

with a multiplication rule $v:T\times T\to T$ defined by (19) for any particular
class. For instance, in the case of the moment generating function kernel
$g_t(Y)=\exp(tY)$, we have $g_t(Y)g_s(Y)=g_{t+s}(Y)$, for all $t,s$ in $T$. The important
practical consequence of the closure property of the kernel classes is
that the transform $c_j(t)=E(g_t(Y_j)|\mathcal{F}_{j-1})$ yields the joint moment structure
$c_j(v(t,s))$ of the kernel class, for $1\le j\le n$. Thus, the matrix $E(\mathbf{h}_j\mathbf{h}_j'|\mathcal{F}_{j-1})$,
with $\mathbf{h}_j=(h_j(t_1),\dots,h_j(t_k))'$, can be readily obtained since its entries are
of the form

$$E(h_j(t_i)h_j(t_l)|\mathcal{F}_{j-1})=\mathrm{Cov}(g_{t_i}(Y_j),g_{t_l}(Y_j)|\mathcal{F}_{j-1})=c_j(v(t_i,t_l))-c_j(t_i)c_j(t_l),\quad i\ne l,$$

for any of the aforementioned kernels, except for the kernel $g_t(Y)=\exp(itY)$
of the characteristic function, for which closure is under complex conjugation
and the corresponding entries are $c_j(t_i-t_l)-c_j(t_i)c_j(-t_l)$. Therefore, optimal
combinations of transform martingale estimating functions can be readily
constructed. The property of closure of the kernel classes under multiplication is
preserved in multivariate kernels in an obvious way, in view of their defining
property (8).
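
As an illustration of how the closure property delivers the covariance matrix directly from the transform, the sketch below builds the matrix with entries $c(t_i+t_l)-c(t_i)c(t_l)$ for the moment generating function kernel. It is a minimal example under assumptions: the conditional distribution is replaced by a fixed normal distribution with a known MGF, and the function names `mgf_normal` and `kernel_covariance` are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def mgf_normal(t: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Moment generating function E exp(tY) of Y ~ N(mu, sigma^2)."""
    return math.exp(mu * t + 0.5 * (sigma * t) ** 2)


def kernel_covariance(points: Sequence[float],
                      transform: Callable[[float], float],
                      combine: Callable[[float, float], float] = lambda t, s: t + s
                      ) -> List[List[float]]:
    """Covariance matrix of (g_{t_1}(Y), ..., g_{t_k}(Y)) obtained from the
    transform alone: entry (i, l) is c(v(t_i, t_l)) - c(t_i) c(t_l).
    The default rule v(t, s) = t + s is the one for the MGF kernel."""
    c = [transform(t) for t in points]
    return [[transform(combine(ti, tl)) - c[i] * c[l]
             for l, tl in enumerate(points)]
            for i, ti in enumerate(points)]


if __name__ == "__main__":
    for row in kernel_covariance([0.2, 0.5], mgf_normal):
        print(row)
```

No moments of products of kernel functions need to be computed separately; only evaluations of the transform at combined points are required.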

A special feature of the characteristic function with kernel class $\{g_t(Y)=\exp(itY),\ t\in\mathbb{R}\}$
merits attention. Notwithstanding the notational convenience
associated with using the complex valued kernel, for computational
purposes it is preferable to work with the class of real valued kernel vectors
$\{(\cos(tY),\sin(tY)),\ t\in\mathbb{R}\}$. Thus, we may start with writing the basic
martingale difference $h_j(t)=g_t(Y_j)-E(g_t(Y_j)|\mathcal{F}_{j-1})$ in the complex domain
form

$$h_j(t)=\cos(tY_j)-E(\cos(tY_j)|\mathcal{F}_{j-1})+i[\sin(tY_j)-E(\sin(tY_j)|\mathcal{F}_{j-1})].$$

Then, for the real valued vector $\mathbf{g}(Y_j)=(\cos(t_1Y_j),\dots,\cos(t_kY_j),\sin(t_1Y_j),\dots,\sin(t_kY_j))'$,
the martingale difference vector is $\mathbf{h}_j=(\mathrm{Re}\,h_j(t_1),\dots,\mathrm{Re}\,h_j(t_k),\mathrm{Im}\,h_j(t_1),\dots,\mathrm{Im}\,h_j(t_k))'$.
As in the i.i.d. case (e.g., Brant [4]), a convenient
choice of kernel vector is $\mathbf{g}(Y_j)=(\cos(\tau Y_j),\dots,\cos(k\tau Y_j),\sin(\tau Y_j),\dots,\sin(k\tau Y_j))'$,
for some $\tau$, which provides an approximation to the score function
by a trigonometric polynomial.

For a specified kernel and fixed $k$, the transform method requires the choice
of the values of $\{t_l\}$. The most convenient approach is to choose the values
a priori with a uniform spacing suitable to the particular problem. This 
has been tried with other transform methods in the case of i.i.d. variables 
and in connection mainly with the characteristic function; see, for example, 
Epps and Pulley [9]. The efficiency of the procedure may vary with different
uniform spacings. In the present context, an optimal uniform spacing could
be sought by maximizing the martingale information with respect to the
distance, say, $\tau$, between the points.

It may be noted parenthetically that the sequence of moments, with kernel
class $\{g_t(y)=y^t,\ t=0,1,2,\dots\}$, has the distinct advantage over the other
transforms that there is a natural choice of points $\{t_l\}$ as the first $k$ positive
integers. The first $k$ should normally be chosen since the variability of the
sample moments increases with their order. For $\mathbf{g}(Y_j)=(Y_j,Y_j^2,\dots,Y_j^k)'$,
the matrix $E(\mathbf{h}_j\mathbf{h}_j'|\mathcal{F}_{j-1})$ involves moments up to order $2k$. The transform
quasi-score function based on integer moments is a polynomial and, thus,
its efficiency depends on how close the terms of the score function are to a
polynomial. We will denote the polynomial quasi-score function based
on the first $k$ moments by $G_n^*(1,\dots,k)$.

In general, for small $k$ an optimal arbitrary spacing may result in sharp
improvement in the efficiency. The values of $\{t_l\}$ can be chosen so that
they maximize the martingale information. This can be taken as the "optimal
choice" rule with respect to efficiency. However, as $k$ increases, so do
the computational complexities of optimizing a $k$-dimensional surface and
inverting $E(\mathbf{h}_j\mathbf{h}_j'|\mathcal{F}_{j-1})$. It is more convenient, following the procedure
described in the previous section, to form the increasing sequence of spaces
$\mathcal{M}_{t_1^*,\dots,t_{k-1}^*}\subseteq\mathcal{M}_{t_1^*,\dots,t_{k-1}^*,t_k}$, $k\ge2$, holding the first $k-1$ points fixed at the
values $t_1^*,\dots,t_{k-1}^*$, and then to maximize $I_{G_n^*(t_1^*,\dots,t_{k-1}^*,t_k)}$ with respect to $t_k$.
Maximizing the information with respect to one point at a time may result
in some loss of efficiency, which in many examples becomes negligible as $k$
grows larger.

The appropriate values of $\{t_l\}$ will generally depend on the unknown parameter.
We propose a two-step approach in which we start with $I_{G_n^*(t)}$ evaluated
at some preliminary estimate of the parameter (e.g., conditional least
squares estimate), and at the $k$th stage of the approximation we evaluate
$I_{G_n^*(t_1^*,\dots,t_{k-1}^*,t_k)}$ at the value of the estimate obtained from $G_n^*(t_1^*,\dots,t_{k-1}^*)=0$.
Then we use the value of the optimal point $t_k^*$ to solve the estimating
equation $G_n^*(t_1^*,\dots,t_{k-1}^*,t_k^*)=0$. The solution is now the updated estimate
in $I_{G_n^*(t_1^*,\dots,t_{k-1}^*,t_k)}$, and a new value of the optimal point $t_k^*$ can be determined.
The iteration continues until convergence to some value of the optimal
point. This value of the optimal point is used in the estimating equation
$G_n^*(t_1^*,\dots,t_{k-1}^*,t_k^*)=0$ to obtain the transform quasi-likelihood estimate and
the value of the information quantity. This iterative scheme converges more
rapidly as the number of points $\{t_l\}$, and hence the efficiency of the estimating
function, increases. In light of this, as more points $t$ are introduced
the iteration may be stopped after the first step to ease the computations.
5. Comparing transform estimating functions. In this section we compare
the polynomial quasi-score function, based on moments, with quasi-score
functions based on other important transforms, namely, the characteristic
function (CF), the moment generating function (MGF) and the probability
generating function (PGF). This comparison is of theoretical and of
methodological interest.

5.1. Comparing $G_n^*(t_1,\dots,t_k)$ with $G_n^*(1,\dots,k)$. A relationship of the
CF, MGF and PGF quasi-score functions with the moment (M) quasi-score
function is established by the following proposition.

Proposition 1. Assume that $E(Y_j^{2k}|\mathcal{F}_{j-1})$ exists. Then for both the CF
transform, with the complex valued kernel $\exp(itY)$, and the MGF transform,
the following relationship holds:

$$\lim_{\max|t_l|\to0}G_n^*(t_1,\dots,t_k)=G_n^*(1,\dots,k).$$

For the PGF transform,

$$\lim_{\max\{|t_l-1|\}\to0}G_n^*(t_1,\dots,t_k)=G_n^*(1,\dots,k).$$

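Proof. For $k=1$, on the assumption that differentiation with respect
to $t$ and expectation operations can be interchanged in

$$G_n^*(t)=-\sum_{j=1}^n\frac{\frac{\partial}{\partial\theta}E(g_t(Y_j)|\mathcal{F}_{j-1})}{\mathrm{Var}(g_t(Y_j)|\mathcal{F}_{j-1})}[g_t(Y_j)-E(g_t(Y_j)|\mathcal{F}_{j-1})],$$

taking the limit and applying l'Hôpital's rule (twice) yields the limit

$$G_n^*(1)=-\sum_{j=1}^n\frac{\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})}{\mathrm{Var}(Y_j|\mathcal{F}_{j-1})}[Y_j-E(Y_j|\mathcal{F}_{j-1})].$$

Alternatively, a first-order Taylor series expansion of $g_t(Y_j)$ gives a representation
of $G_n^*(t)$ in terms of $G_n^*(1)$ as

$$G_n^*(t)=G_n^*(1)+t\sum_{j=1}^nK_j+o(t),\tag{20}$$

where

$$K_j=-\frac{1}{2\,\mathrm{Var}^2(Y_j|\mathcal{F}_{j-1})}\Bigl\{\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})\,\mathrm{Var}(Y_j|\mathcal{F}_{j-1})[Y_j^2-E(Y_j^2|\mathcal{F}_{j-1})]$$
$$\qquad+\Bigl[\frac{\partial}{\partial\theta}E(Y_j^2|\mathcal{F}_{j-1})\,\mathrm{Var}(Y_j|\mathcal{F}_{j-1})-2\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})\,\mathrm{Cov}(Y_j,Y_j^2|\mathcal{F}_{j-1})\Bigr]\times[Y_j-E(Y_j|\mathcal{F}_{j-1})]\Bigr\}.$$

If $t$ is small and the $o(t)$ terms are neglected, then the quasi-score $G_n^*(t)$
appears as a perturbed version of $G_n^*(1)$. We have also

$$I_{G_n^*(t)}=I_{G_n^*(1)}+t\sum_{j=1}^nL_j+o(t),\tag{21}$$

where

$$L_j=\frac{\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})}{\mathrm{Var}^2(Y_j|\mathcal{F}_{j-1})}\Bigl[\frac{\partial}{\partial\theta}E(Y_j^2|\mathcal{F}_{j-1})\,\mathrm{Var}(Y_j|\mathcal{F}_{j-1})-\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})\,\mathrm{Cov}(Y_j,Y_j^2|\mathcal{F}_{j-1})\Bigr].$$

For $k\ge2$, the results can be obtained similarly with the help of a symbolic
computation package. □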
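
The first-order behavior of the information near $t=0$ can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes a single observation $Y\sim N(\theta,\theta^2)$ (a hypothetical model chosen only because its MGF is explicit), uses the MGF kernel, and compares the slope of $I(t)=[\partial_\theta c(t)]^2/[c(2t)-c(t)^2]$ at small $t$ with the coefficient $\partial_\theta m_1\,[\partial_\theta m_2\,V-\partial_\theta m_1\,C]/V^2$, where $m_r=E(Y^r)$, $V=\mathrm{Var}(Y)$ and $C=\mathrm{Cov}(Y,Y^2)$. That form of the coefficient is assumed here as the reading of the coefficient of $t$ in (21).

```python
import math


def info_single_point(t: float, theta: float) -> float:
    """Information of the single-point MGF quasi-score for Y ~ N(theta, theta^2)."""
    c_t = math.exp(theta * t + 0.5 * (theta * t) ** 2)      # E exp(tY)
    c_2t = math.exp(2.0 * theta * t + 2.0 * (theta * t) ** 2)  # E exp(2tY)
    dc_t = (t + theta * t ** 2) * c_t                       # d/dtheta of E exp(tY)
    return dc_t ** 2 / (c_2t - c_t ** 2)


def first_order_coefficient(theta: float) -> float:
    """Assumed coefficient of t in the expansion of the information."""
    m1, m2, m3 = theta, 2.0 * theta ** 2, 4.0 * theta ** 3  # raw moments of N(theta, theta^2)
    dm1, dm2 = 1.0, 4.0 * theta                             # derivatives w.r.t. theta
    var = m2 - m1 ** 2
    cov = m3 - m1 * m2                                      # Cov(Y, Y^2)
    return dm1 / var ** 2 * (dm2 * var - dm1 * cov)


if __name__ == "__main__":
    theta, t = 2.0, 1e-3
    info_at_zero = 1.0 / theta ** 2                         # (d m1 / d theta)^2 / Var(Y)
    slope = (info_single_point(t, theta) - info_at_zero) / t
    print(slope, first_order_coefficient(theta))
```

In this example the coefficient is positive, so the information increases as $t$ moves away from zero and the moment-based quasi-score is not the most informative single-point choice.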

Note that for the CF transform with the real valued kernel $(\cos(tY),\sin(tY))$
the limiting quasi-score is $G_n^*(1,\dots,2k)$. If $m<k$ points $t$ tend to zero (or
to one for the PGF transform), the limiting quasi-score function involves a
kernel vector whose corresponding $m$ components are the first $m$ moments
(or the first $2m$ moments for the CF transform).

According to the above relationships, the CF, MGF and PGF transform 
methods are essentially equivalent to the M method for values of $t_1,\dots,t_k$
very close to zero (or very close to one for the PGF transform). In general, 
such choice of values is not optimal and a larger k may be required for the M 
method to achieve the same level of efficiency as optimal, or nearly optimal, 
CF, MGF and PGF methods. This is because the moments may not carry 
as much information as the other transforms; a relevant heuristic argument 
is given in Kiefer [22]. We consider next a situation where the M method 
may be the most efficient. 

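Proposition 2. For $k=1$ and kernels $g_t(Y)=\exp(itY)$ and $g_t(Y)=\exp(tY)$,
the necessary condition that

$$\max_tI_{G_n^*(t)}=\lim_{t\to0}I_{G_n^*(t)}\quad(=I_{G_n^*(1)})\tag{22}$$

is

$$\frac{\partial}{\partial\theta}E(Y_j^2|\mathcal{F}_{j-1})\,\mathrm{Var}(Y_j|\mathcal{F}_{j-1})-\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})\,\mathrm{Cov}(Y_j,Y_j^2|\mathcal{F}_{j-1})=0,\tag{23}$$

for all $j$. An equivalent condition is

$$\frac{\partial}{\partial\theta}\mathrm{Var}(Y_j|\mathcal{F}_{j-1})-\lambda_{j1}\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})\,\mathrm{Var}^{1/2}(Y_j|\mathcal{F}_{j-1})=0,\tag{24}$$

where $\lambda_{j1}=E[(Y_j-E(Y_j|\mathcal{F}_{j-1}))^3|\mathcal{F}_{j-1}]/\mathrm{Var}^{3/2}(Y_j|\mathcal{F}_{j-1})$ is the index of
skewness for the conditional distribution of $Y_j$. An analogous result holds
for the kernel function $g_t(Y)=t^Y$ when $t$ approaches 1.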

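Proof. We obtain, with the help of a symbolic computation package,

$$\lim_{t\to0}\frac{d}{dt}I_{G_n^*(t)}=\sum_{j=1}^n\frac{\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})}{\mathrm{Var}^2(Y_j|\mathcal{F}_{j-1})}\Bigl[\frac{\partial}{\partial\theta}E(Y_j^2|\mathcal{F}_{j-1})\,\mathrm{Var}(Y_j|\mathcal{F}_{j-1})-\frac{\partial}{\partial\theta}E(Y_j|\mathcal{F}_{j-1})\,\mathrm{Cov}(Y_j,Y_j^2|\mathcal{F}_{j-1})\Bigr].$$

Note that this limit is the coefficient of $t$ in the expansion of $I_{G_n^*(t)}$ in (21).
Now set $\lim_{t\to0}\frac{d}{dt}I_{G_n^*(t)}=0$. This is equivalent to (23). Then the result follows
on checking the sign of the second derivative. The equivalence of (23)
and (24) is easy to prove. □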
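
The equivalence of (23) and (24) can be spelled out in a few lines. In the sketch below the shorthand $m_1=E(Y_j|\mathcal{F}_{j-1})$, $m_2=E(Y_j^2|\mathcal{F}_{j-1})$, $V=\mathrm{Var}(Y_j|\mathcal{F}_{j-1})$ and $\mu_3=E[(Y_j-m_1)^3|\mathcal{F}_{j-1}]=\lambda_{j1}V^{3/2}$ is introduced only for this illustration.

```latex
\begin{align*}
\operatorname{Cov}(Y_j,Y_j^2\mid\mathcal{F}_{j-1}) &= \mu_3 + 2m_1V,
\qquad
\frac{\partial m_2}{\partial\theta} = \frac{\partial V}{\partial\theta} + 2m_1\frac{\partial m_1}{\partial\theta},\\[4pt]
\frac{\partial m_2}{\partial\theta}\,V - \frac{\partial m_1}{\partial\theta}\operatorname{Cov}(Y_j,Y_j^2\mid\mathcal{F}_{j-1})
&= V\frac{\partial V}{\partial\theta} - \mu_3\frac{\partial m_1}{\partial\theta}
 = V\Bigl(\frac{\partial V}{\partial\theta} - \lambda_{j1}\frac{\partial m_1}{\partial\theta}V^{1/2}\Bigr).
\end{align*}
```

Since $V>0$, the left-hand side of (23) vanishes exactly when the left-hand side of (24) does.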

Thus, the condition (23) is necessary for the quasi-score function $G_n^*(1)$,
based on the first moment, to be more informative than the CF, MGF and
PGF quasi-score functions based on a single point $t$. When $\mathrm{Var}(Y_j|\mathcal{F}_{j-1})$
is independent of $\theta$ for all $j$, as in stationary autoregressive processes, it
follows from (24) that a necessary condition for (22) to hold is that $\lambda_{j1}=0$
for all $j$. This occurs when the conditional distribution of $Y_j$ is symmetric
around its mean. Although the form of this distribution is supposed to be
unknown, its symmetry can be easily checked: the characteristic function
of $Y_j-E(Y_j|\mathcal{F}_{j-1})$ must be real and even (see Rao [31], page 142).

5.2. Choosing transform quasi-score functions. Estimation using the estimating
function $G_n^*(1)$ is the martingale version of what has been described
by Wedderburn [36] as quasi-likelihood estimation in the context of
independent observations. Notably, $G_n^*(1)$ is the exact score function for exponential
family distributions with linear sufficient statistic (Remark 7, Section 3.2).
Wedderburn's quasi-likelihood has been discussed in the context
of discrete semimartingales in Hutton and Nelson [21], Godambe and Heyde
[15] and Sørensen [35]. It should be noted that when $\sigma_j^2=\mathrm{Var}(Y_j|\mathcal{F}_{j-1})$
is independent of $\theta$, $G_n^*(1)$ is the same as the weighted conditional least
squares estimating function obtained by minimizing the sum of squares
$\sum_{j=1}^n[Y_j-E(Y_j|\mathcal{F}_{j-1})]^2/\sigma_j^2$ with respect to $\theta$. When $\sigma_j^2$ is also constant over
all $j$, $G_n^*(1)$ reduces to the conditional least squares estimating function of
Klimko and Nelson [23].
Although $G_n^*(1)$ will be globally optimal within the class of linear estimating
functions based on the semimartingale representation of a stochastic
process in an exponential family setting, it will typically be suboptimal
within a larger class of estimating functions for nonexponential families.
Then superior estimating functions may be readily constructed based on
the semimartingale representation (5) of the transformed process.

First, consider polynomial estimating functions. A quadratic quasi-score
function, essentially equivalent to $G_n^*(1,2)$, has been considered by Godambe
and Thompson [16] as an extension of the ordinary quasi-score $G_n^*(1)$ incorporating
possible knowledge of the skewness and kurtosis of the underlying
distribution. The formulation in Godambe and Thompson [16] involves a
more general conditioning than the martingale structure, and the two components
of the quadratic estimating function are the first two central moments
corrected to zero mean and orthogonalized. A version of $G_n^*(1,2)$ for
i.i.d. variables had been considered earlier by Crowder [7].

Recall now from Remark 7 in Section 3.2 that (23) is the condition for
the coefficient $w_{j2}^*$ in the quasi-score function $G_n^*(1,2)$ to be zero. Therefore,
$G_n^*(1)$ is as efficient as $G_n^*(1,2)$ if the condition (23) holds for all $j$. Otherwise,
when $\mathrm{Var}(Y_j|\mathcal{F}_{j-1})$ is independent of the parameter $\theta$, it is not difficult to
show that $G_n^*(1,2)$ reduces to $G_n^*(1)$ if the skewness is zero, and that the
quasi-score function $G_n^*(1,2,3)$ also reduces to $G_n^*(1)$ if both skewness and
kurtosis are zero. In situations different from those mentioned, $G_n^*(1)$ is less
efficient than $G_n^*(1,2)$. In fact, it is less efficient than a simple nonlinear
transform quasi-score function [perturbed version of $G_n^*(1)$] that is based on
a single optimal point $t$. Of course, only nonpolynomial martingale quasi-score
functions are applicable if the variables $Y_j$ have no finite (conditional)
moments, for example, if $Y_j$ has a stable distribution; see Example 1 in the
next section.

A composite transform quasi-score function utilizing more than a single
point $t$ will generally be more efficient. We may also combine different transforms.
For example, we may choose $\mathbf{g}(Y_j)=(Y_j,Y_j^2,\dots,Y_j^{k_1},\exp(t_{k_1+1}Y_j),\dots,\exp(t_{k_2}Y_j))'$,
with optimal points $t_{k_1+1},\dots,t_{k_2}$, thereby combining the convenience
of the M method with the higher efficiency of the MGF method.

Composite quasi-score functions may be also effective in dealing with 
situations in which the estimating functions for a vector parameter in alter- 
native methods (e.g., conditional least squares) are functionally dependent 
and, thus, no estimates can be obtained; see Example 2 in the next section. 

6. Examples. The following two brief examples illustrate important fea- 
tures of the proposed method. A more detailed study of these applications 
will be reported elsewhere. 
Example 1. AR(1) process with symmetric stable error. Consider the
AR(1) process

$$Y_j=\phi Y_{j-1}+\varepsilon_j,$$

where $\{\varepsilon_j\}$ is an i.i.d. sequence of symmetric stable random variables with
characteristic function $c(t)=\exp(-|t|^\alpha)$, $0<\alpha\le2$. Closed form density
representation exists only in the Cauchy ($\alpha=1$) and Gaussian ($\alpha=2$) cases.
We wish to estimate $\phi$ on the basis of a sample $Y_1,\dots,Y_n$. Consider the
martingale difference

$$h_j(t)=\exp(itY_j)-E(\exp(itY_j)|Y_{j-1})=\exp(itY_j)-\exp(it\phi Y_{j-1}-|t|^\alpha).$$

Then the transform quasi-score function constructed using the real kernel
$(\cos(tY_j),\sin(tY_j))$ is $G_n^*(t)=\sum_{j=1}^n\mathbf{w}_j^*\mathbf{h}_j$, where $\mathbf{h}_j=(\mathrm{Re}\,h_j(t),\mathrm{Im}\,h_j(t))'$
and $\mathbf{w}_j^*=(E(\dot{\mathbf{h}}_j|Y_{j-1}))'(E(\mathbf{h}_j\mathbf{h}_j'|Y_{j-1}))^{-1}$. Noticing that $(E(\dot{\mathbf{h}}_j|Y_{j-1}))'=(\frac{\partial}{\partial\phi}\mathrm{Re}\,h_j(t),\frac{\partial}{\partial\phi}\mathrm{Im}\,h_j(t))'$,
and using properties of the cosine and sine functions
to derive the entries of $(E(\mathbf{h}_j\mathbf{h}_j'|Y_{j-1}))^{-1}$, we can show that

$$G_n^*(t)=\frac{2t\exp(2^\alpha|t|^\alpha)}{\exp(|t|^\alpha)(\exp(2^\alpha|t|^\alpha)-1)}\sum_{j=1}^nY_{j-1}\sin(t(\phi Y_{j-1}-Y_j))$$

and

$$I_{G_n^*(t)}=\frac{2t^2\exp(2^\alpha|t|^\alpha)}{\exp(2|t|^\alpha)(\exp(2^\alpha|t|^\alpha)-1)}\sum_{j=1}^nY_{j-1}^2.\tag{25}$$

It is important to note that because of the infinite variance of the $Y_j$'s,
the $O_F$ information quantity $E(G_n^{*2}(t))$, which is equal to $E(I_{G_n^*(t)})$ in the
finite variance case, is not defined in the present case, implying that the
$O_F$-optimality criterion is not applicable. However, $I_{G_n^*(t)}<\infty$ for all $n\ge1$
and given $t\in T$.

Turning now to the choice of the optimal value of the point $t\in T$, we
observe that this choice is independent of the parameter $\phi$ and the observations.
We also observe in (25) that the factor multiplying $\sum_{j=1}^nY_{j-1}^2$ in
$I_{G_n^*(t)}$ is an even function of $t$. We consider this factor for $t>0$ to obtain
its generalized series expansion $2^{1-\alpha}t^{2-\alpha}+2^{1-\alpha}(2^{\alpha-1}-2)t^2+O(t^{\alpha+2})$. It
follows that

$$\lim_{t\to0}I_{G_n^*(t)}=\begin{cases}0,&\text{if }0<\alpha<2,\\[2pt]\tfrac12\sum_{j=1}^nY_{j-1}^2,&\text{if }\alpha=2.\end{cases}$$

In the case of a normal distribution ($\alpha=2$), $I_{G_n^*(t)}\le\tfrac12\sum_{j=1}^nY_{j-1}^2=\lim_{t\to0}I_{G_n^*(t)}$ $(=I_{S_n})$,
that is, $I_{G_n^*(t)}$ attains its maximum at the origin [in
accordance with (22)]. For $\alpha<2$, $I_{G_n^*(t)}$ attains its maximum at a value of
$t>0$; the smaller the value of $\alpha$, the larger the optimal point $t^*$.

Table 1
The efficiency of $G_n^*(t^*)$

  $\alpha$   Fisher information (i.i.d.)   $I_{G_n^*(t^*)}/\sum_{j=1}^nY_{j-1}^2$   $t^*$     $\mathrm{eff}_c(G_n^*(t^*),S_n)$
  2.0        0.500                         0.500                                    0.00      1
  1.9        0.473                         0.469                                    0.3852    0.991
  1.7        0.442                         0.428                                    0.4767    0.968
  1.5        0.428                         0.391                                    0.5384    0.913
  1.3        0.431                         0.358                                    0.6087    0.831
  1.1        0.463                         0.332                                    0.7148    0.717
  1.0        0.500                         0.324                                    0.7968    0.648
  0.8        0.678                         0.330                                    1.1022    0.487

We can assess the efficiency of $G_n^*(t^*)$ for varying $\alpha$ by comparing the
factor multiplying $\sum_{j=1}^nY_{j-1}^2$ in $I_{G_n^*(t^*)}$ with the essentially exact Fisher
information (per observation) for the location parameter, say, $\phi$, computed
by DuMouchel [8] in the setting of i.i.d. variables for selected values of $\alpha$.
This is presented in Table 1. The optimal values for $t$ in each case are also
shown.

It can be seen that for the range of values of $\alpha$ reported in Table 1 the
efficiency of $G_n^*(t^*)$ decreases as $\alpha$ deviates from $\alpha=2$. The efficiency of the
transform quasi-score function can be increased by introducing more points
$t$, albeit at the expense of increased computational complexity.
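
The optimal point for a given $\alpha$ can be located numerically. The sketch below is illustrative only: it assumes that the factor multiplying $\sum_jY_{j-1}^2$ in the information is $2t^2\exp(2^\alpha t^\alpha)/[\exp(2t^\alpha)(\exp(2^\alpha t^\alpha)-1)]$ for $t>0$, rewritten in an algebraically equivalent form for numerical stability, and it uses a plain grid search. The function names and the grid settings are hypothetical choices, not part of the original procedure.

```python
import math
from typing import Tuple


def info_factor(t: float, alpha: float) -> float:
    """Factor multiplying sum(Y_{j-1}^2) in the information, for t > 0.
    Equivalent to 2 t^2 exp(2^a t^a) / (exp(2 t^a) (exp(2^a t^a) - 1))."""
    x = t ** alpha
    return 2.0 * t * t * math.exp(-2.0 * x) / (-math.expm1(-(2.0 ** alpha) * x))


def optimal_point(alpha: float, t_max: float = 3.0, steps: int = 30000) -> Tuple[float, float]:
    """Grid search for the point t in (0, t_max] maximizing the factor."""
    grid = (t_max * i / steps for i in range(1, steps + 1))
    t_star = max(grid, key=lambda t: info_factor(t, alpha))
    return t_star, info_factor(t_star, alpha)


if __name__ == "__main__":
    for a in (2.0, 1.5, 1.0):
        print(a, optimal_point(a))
```

For the Cauchy case $\alpha=1$ the search returns a point close to 0.797 with factor close to 0.324, in line with the corresponding row of Table 1; for $\alpha=2$ the maximizer sits at the lower end of the grid, with factor close to 0.5.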

Example 2. A first-order gamma autoregressive model. Consider the
first-order gamma autoregressive model (Sim [33]) given by

$$Y_j=\alpha*Y_{j-1}+\varepsilon_j,\tag{26}$$

where the operator $*$ is defined as $\alpha*Y=\sum_{i=1}^{N(Y)}E_i$, and (i) the $\varepsilon_j$ are
i.i.d. Gamma$(\alpha,\nu)$ random variables with $\alpha,\nu>0$; (ii) the $E_i$ are i.i.d.
exponential$(\alpha)$ random variables; (iii) for each fixed positive value of $y$,
$N(y)$ is a Poisson random variable with parameter $\lambda=p\alpha$, and $0<p<1$.

Expression (26) is an autoregressive representation for a stationary gamma
process whose joint density function has a certain type of Laplace transform.
This process has been used in the study of stochastic reservoir systems
with Markovian inflows; see Sim [33] and references therein. The conditional
Laplace transform of $Y_j$ can be easily derived (Sim [33]) and is given by

$$E(\exp(-sY_j)|Y_{j-1}=y_{j-1})=\Bigl(\frac{\alpha}{\alpha+s}\Bigr)^\nu\exp\Bigl(-\frac{\lambda sy_{j-1}}{\alpha+s}\Bigr).\tag{27}$$
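
A small simulation can be used to check a conditional Laplace transform of this type. The sketch below rests on assumptions that should be read as such: the innovation is taken to be gamma with shape $\nu$ and rate $\alpha$, the $E_i$ are exponential with rate $\alpha$, and $N(y)$ is taken to be Poisson with mean $\lambda y$, which is the reading under which the transform equals $(\alpha/(\alpha+s))^\nu\exp(-\lambda sy/(\alpha+s))$. The helper names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Poisson(mean) variate, obtained by counting unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def gar1_step(rng: random.Random, y_prev: float, alpha: float, lam: float, nu: float) -> float:
    """One transition Y_j = alpha * Y_{j-1} + eps_j of the gamma autoregression (assumed form)."""
    n = poisson_draw(rng, lam * y_prev)                      # assumed: N(y) ~ Poisson(lam * y)
    thinned = sum(rng.expovariate(alpha) for _ in range(n))  # sum of N exponential(alpha) terms
    innovation = rng.gammavariate(nu, 1.0 / alpha)           # shape nu, scale 1/alpha
    return thinned + innovation


def laplace_transform(s: float, y_prev: float, alpha: float, lam: float, nu: float) -> float:
    """Conditional Laplace transform E(exp(-s Y_j) | Y_{j-1} = y_prev) under the same assumptions."""
    return (alpha / (alpha + s)) ** nu * math.exp(-lam * s * y_prev / (alpha + s))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [gar1_step(rng, 2.0, 1.5, 0.9, 2.0) for _ in range(20000)]
    empirical = sum(math.exp(-0.7 * y) for y in draws) / len(draws)
    print(empirical, laplace_transform(0.7, 2.0, 1.5, 0.9, 2.0))
```

Agreement between the empirical and the closed-form values supports using the transform directly in the martingale difference $h_j(s)=\exp(-sY_j)-E(\exp(-sY_j)|Y_{j-1})$ without ever evaluating the conditional density.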
Inversion of (27) gives the conditional density of $Y_j$ as

$$f(y_j|y_{j-1})=\alpha\Bigl(\frac{\alpha y_j}{\lambda y_{j-1}}\Bigr)^{(\nu-1)/2}\exp[-(\alpha y_j+\lambda y_{j-1})]\,I_{\nu-1}[2(\lambda\alpha y_jy_{j-1})^{1/2}],$$

where $I_r(z)$ is the modified Bessel function of the first kind and of order
$r$. The likelihood function being complicated, Sim suggested (but did not
apply) the conditional least squares method for the estimation of the three
parameters of the model. The conditional least squares equations are

$$\sum_{j=1}^n(\lambda Y_{j-1}+\nu)[\alpha Y_j-\lambda Y_{j-1}-\nu]=0,$$

$$\sum_{j=1}^n[\alpha Y_j-\lambda Y_{j-1}-\nu]=0.$$

This system of equations has no solution other than the trivial solution
$\lambda=\alpha=\nu=0$. The same is true for the quasi-likelihood equations $G_n^*(1)=0$,
and for the quasi-likelihood equations $G_n^*(s)=0$ based on the Laplace transform
(27) and employing a single point $s$. All three systems of equations mentioned
above have a solution if one of the parameters is fixed. In particular,
if we consider the parameter $\nu$ fixed, then maximum likelihood estimation
is also possible, though very complicated.

Estimation of all three parameters of the model is possible by using a
composite quasi-score function based on the Laplace transform (27). Conditional
moments of any order may be obtained from (27) in simple form; see
Sim [33]. Then the convenient composite quasi-score function $G_n^*(1,2)$ can
be used. The corresponding system of quasi-likelihood equations, shown in
simulations to have a numerical solution, is

n 

^[5A^y/„i + {-6aYj + lOz^ + 4)A2y/„^ 



+ {a'^Y^^ - 6((1 + ij)aYj -v- z^^))Ay/„i 



,2n„,v. , ,,2 , ,,3x 



+ {{-V - v')aYj + u' + v'')Yj^i] = 0, 

n 

^[-3A^y/_i + {2aYj - mv - 2)X^Yf_^ 

3=i 



2a^2 I ci,,„,\r a,, 10,,2^\2^^2 

2 I ,,3\„,v- ,,3 ,,4i 



+ (a^y/ + GvoYj -6u- l2u^)X%Li 



{6u - 5aYj)u{l + u)XYj.i + (i/^ + u'^)aYj - W^ - v^] = 0, 
n

Y,[Q^%ti + ((25/2)z^ + 5 - 8ayj)A2y/_i 
i=i 

+ (2a2y/ - 9(1 + i^)aYj + 8i^ + 8u'^)XYj^i 

+ (l/2)i/y/a2 - 2i/(l + i/)ayj + {3/2)u'^{l + i/)] = 0. 